pacman::p_load(readxl, reshape, survival, survminer, ggplot2, cowplot, dplyr, My.stepwise, haven, tidyr, knitr, lubridate, naniar, table1, Hmisc, skimr, psych, lavaan)
Structural Equation Modeling (SEM) is a general analytic framework that includes the t-test, ANOVA, MANOVA, multiple regression, path analysis, factor analysis, and growth trajectory models as special cases.
Variables play roles as predictors, outcomes, mediators, and factors; path diagrams are used for model specification.
Identification: is it possible to obtain estimates of all model parameters?
Model-implied moments: a model is a set of parameters. Population moments include the means and covariances. The model-implied moments are the moment matrix implied by the model, analogous to the relationship between \(y\) and \(\hat{y}\). Identification asks: can each parameter be uniquely expressed using observed information? That is, is there sufficient known information to infer the unknown values? A model is over-identified when the number of observed moments exceeds the number of estimated parameters. Notation: population moments are \(\Sigma\) and \(\mu\); sample moments are \(S\) and \(m\); population model-implied moments are \(\Sigma(\theta)\) and \(\mu(\theta)\).
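As a minimal illustration of the sample moments \(m\) and \(S\) (using a tiny made-up data frame, not the course data):

```r
# Toy illustration (hypothetical data, not from the course datasets):
# the sample moments are the mean vector m and the covariance matrix S.
d <- data.frame(x = c(1, 2, 3, 4), y = c(2, 1, 4, 3))
m <- colMeans(d)   # sample means
S <- cov(d)        # sample variances and covariances
m
S
```

A fitted model's \(\Sigma(\hat{\theta})\) and \(\mu(\hat{\theta})\) are then compared against these observed quantities.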
How do we obtain the best estimates? Maximum likelihood (ML) estimation is asymptotically unbiased, consistent, and maximally efficient.
How well does the model fit the data? The relative fit of competing models can be compared using likelihood ratio tests (chi-square difference tests). Regression models are just-identified: they use the same number of parameters as observed moments and so reproduce the observed moments exactly. Such a model necessarily fits perfectly and is said to be saturated. Most SEMs are over-identified, which results in testable restrictions on the model-implied moments.
Can the model be improved? How are modifications identified?
Which effects are significant? Which are substantively meaningful?
Read the Deviant Peer Affiliations dataset. The header argument of the read.table function is set to FALSE to indicate that there are no variable names at the top of the data file.
afdp <- read.table("data/afdp.dat", header = FALSE, col.names = c("id", "coa", "age", "gen", "stress", "emotion", "negaff", "peer"))
psych::describe(afdp)
peer: adolescent report of peer substance use and peer tolerance of use
coa: parent report of alcoholism diagnosis (0 = non-alcoholic, 1 = alcoholic)
gen: gender (0 = girl, 1 = boy)
age: age measured in years at assessment
stress: self-report measure of uncontrollable negative stressful life events
emotion: self-report measure of temperamental emotional expressiveness
negaff: self-report measure of depression and anxiety
\[ \text{peer}_i = \alpha + \gamma\,\text{coa}_i + \zeta_i \]
First, we need to create what's called a model syntax object. Then we fit the model using the sem function and summarize the results. Note that we include the meanstructure argument in the sem function so that we obtain an estimate of the regression intercept.
model.1 <- 'peer ~ coa'
summary(sem(model.1, data=afdp, meanstructure=TRUE),standardized=TRUE,rsquare=TRUE)
lavaan 0.6-12 ended normally after 9 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 3
Number of observations 316
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
peer ~
coa 0.176 0.060 2.956 0.003 0.176 0.164
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.peer 0.298 0.043 6.884 0.000 0.298 0.554
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.peer 0.280 0.022 12.570 0.000 0.280 0.973
R-Square:
Estimate
peer 0.027
We are using ML (maximum likelihood) estimation with the nlminb optimizer, a built-in R optimizer for finding the minimum of a function.
It is worth pausing here to note that lavaan counts parameters differently than we described in lecture. For this model, we would normally count five parameters, the mean and variance of COA, the intercept and slope of the regression of Peer on COA, and the residual variance of Peer. But lavaan does not count the mean or variance of the predictor, COA, as parameters of the model. Nor does it count covariances among predictors (though there are no such covariances in the current model) as free parameters. Fortunately, lavaan also does not count these parameters when determining the number of observed moments, so the degrees of freedom for the test statistic work out regardless (e.g., here it neither counts the mean and variance of COA as observed moments nor does it count them as estimated moments, leaving a net difference of zero when calculating the degrees of freedom).
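This bookkeeping can be sketched with a small helper function (our own function, not part of lavaan; it assumes a model with a mean structure). A model for \(p\) variables has \(p\) means and \(p(p+1)/2\) (co)variances, i.e. \(p(p+3)/2\) observed moments in total, and lavaan drops the moments of the \(n_x\) exogenous predictors from both sides of the ledger:

```r
# Sketch of lavaan's degrees-of-freedom bookkeeping (hypothetical helper,
# not a lavaan function). p = number of observed variables, n_x = number of
# exogenous predictors, n_par = number of parameters lavaan counts.
df_lavaan <- function(p, n_x, n_par) {
  total   <- p * (p + 3) / 2      # all first and second moments
  dropped <- n_x * (n_x + 3) / 2  # moments of the exogenous predictors
  (total - dropped) - n_par       # df = counted moments - counted parameters
}
df_lavaan(p = 2, n_x = 1, n_par = 3)   # simple regression above: df = 0 (saturated)
df_lavaan(p = 7, n_x = 3, n_par = 18)  # the path model fit later: df = 8
```

This reproduces the df = 0 reported for the simple regression and the df = 8 reported for the path model later in these notes.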
This output also provides some information about model fit. Multiple regression models are just identified (i.e., every piece of information provided by the sample is ‘used up’ to estimate model parameters so that no degrees of freedom remain). Therefore, the model fits the data perfectly and it is not worthwhile to interpret the test statistic (this and other fit indices will be discussed in later sections).
The output is subdivided into three parts: regressions, intercepts, and variances. In the Regressions section, the peer ~ coa coefficient is the slope parameter estimate \(\hat{\gamma}\). In the Intercepts section, the coefficient listed for .peer is the intercept \(\hat{\alpha}\). Finally, in the Variances section, the coefficient listed for .peer represents \(\hat{\psi}\), the estimated variance of \(\zeta_i\) (i.e., the residual variance). The Estimate column lists the ML point estimate of each parameter. The Std.Err column gives the standard error of each estimate (computed using the expected information matrix, as noted above). Next, the z-value column reports the Wald z-statistic for the null hypothesis that the parameter is zero in the population, and the P(>|z|) column reports the p-value associated with this z-statistic.
Here, we see that the average non-COA has a score of .298 on peer and the average COA has a score .176 units higher (.298 + .176 = .474). Both the intercept and slope are significantly different from zero. Finally, the variance in deviant peer association not explained by COA status is .280, indicating that, although COA status is a significant predictor of peer, it does not fully account for affiliation with deviant peers.
Because peer is not on an intrinsically meaningful scale, we cannot easily interpret the differences among the regression parameters or residual variances. To better interpret these results, we can request standardized estimates. Within lavaan, several methods of standardization are available within the summary function. The first type, obtained by including standardized=TRUE, is the typical fully standardized solution. In this case, both the independent and dependent variables are standardized to have a variance of 1 (note that lavaan does not, however, also make the means 0, as one might expect).
The Std.lv column refers to a solution in which only the latent variables are standardized and not the observed variables. Since the current model contains no latent variables, this is of little use here. The Std.all column contains the fully standardized estimates of interest, in which all variables (observed and latent) are standardized to have a variance of one. Thus, we say that a one standard deviation increase in COA status is associated with a .164 standard deviation increase in deviant peer associations.
It may be more useful to retain the original scaling of COA while standardizing peer. If we include std.nox=TRUE in the summary function, as shown below, we will obtain estimates in which the latent variables and endogenous observed variables are standardized but the exogenous observed variables are left in their raw scale. These are commonly known as partially standardized estimates.
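To see what these two standardizations do, here is a base-R sketch on simulated data (simulated, not the afdp data; variable names are ours). The fully standardized slope is \(b \cdot sd(x)/sd(y)\), while the partially standardized (std.nox) slope divides by \(sd(y)\) only:

```r
# Sketch on simulated data (hypothetical, not the afdp data).
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.3)                   # a binary predictor, like coa
y <- 0.3 + 0.2 * x + rnorm(n, sd = 0.5)  # outcome with a known slope
b <- coef(lm(y ~ x))[["x"]]              # raw (unstandardized) slope
b_std <- b * sd(x) / sd(y)               # fully standardized (Std.all analogue)
b_nox <- b / sd(y)                       # partially standardized (std.nox analogue)
```

The fully standardized slope agrees exactly with the slope from regressing scale(y) on scale(x), which is one way to check the arithmetic.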
Finally, we can calculate how much variance in peer is explained by coa by requesting that lavaan output the r-squared statistic. R-Square is the estimated proportion of variance in peer that is accounted for by the model (i.e., by COA, the only predictor included in this model). We have explained less than 3% of the total variance in peer with the COA predictor. Note, however, that measured-variable regression models assume that no measurement error is present. If measurement error is present, the estimated relationships among the variables may be attenuated and the \(R^2\) value will also be underestimated.
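The attenuation effect is easy to demonstrate on simulated data (a hypothetical example, not the afdp data): adding measurement error to a predictor shrinks both its estimated slope and the \(R^2\) toward zero.

```r
# Attenuation due to measurement error (simulated demonstration).
set.seed(10)
n <- 100000
x_true <- rnorm(n)             # error-free predictor
y <- 0.5 * x_true + rnorm(n)   # true slope = 0.5
x_obs <- x_true + rnorm(n)     # add error: reliability = 0.5
b_true <- coef(lm(y ~ x_true))[["x_true"]]  # close to 0.50
b_obs  <- coef(lm(y ~ x_obs))[["x_obs"]]    # close to 0.25, attenuated by reliability
```

With a reliability of .5, the slope on the error-laden predictor is roughly halved, and the model's \(R^2\) drops accordingly.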
\[ \text{peer}_i = \alpha + \gamma_1\,\text{coa}_i + \gamma_2\,\text{gen}_i + \gamma_3\,\text{age}_i + \zeta_i \]
We have specified that peer is regressed on coa, gen, and age. Note that, again, we use the regression operator ~ to indicate the regression. Now that our model includes multiple predictors, we place the + operator between each pair of predictors.
summary( sem('peer ~ coa + gen + age', data=afdp, meanstructure=TRUE), std.nox=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 19 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 316
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.nox
peer ~
coa 0.210 0.054 3.858 0.000 0.210 0.391
gen -0.048 0.055 -0.877 0.381 -0.048 -0.089
age 0.151 0.019 7.993 0.000 0.151 0.281
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.nox
.peer -1.610 0.250 -6.453 0.000 -1.610 -2.999
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.nox
.peer 0.232 0.018 12.570 0.000 0.232 0.804
R-Square:
Estimate
peer 0.196
We see that gen is not a significant predictor of peer when COA and age are also included in the model (p = .381). However, age is significantly related to peer such that older adolescents are more likely to associate with deviant peers. We estimate that a one-year increase in age is associated with a .151-unit increase in deviant peer association ratings. COA remains a significant predictor of peer, and its regression coefficient has not changed much relative to its standard error (0.06) from the single-predictor to the multiple-predictor model (it increased from .18 to .21). The stability of this coefficient reflects the fact that COA is not highly correlated with gen or age.
To better interpret these results, and to get a sense for the relative contribution of each predictor, we requested the partially standardized solution. We prefer this method of standardization for this example because all three predictors have natural scales. As can be seen in the Std.nox column of output, after controlling for age and gender, COAs affiliate with deviant peers about .391 standard deviation units more than non-COAs. Each additional year of age is associated with a .281 standard deviation increase in self-reported affiliation with deviant peers.
From the R-square output, we can see that the multiple predictor regression model explains more of the variance in peer than the single predictor model. Age and gender account for an additional 17% of the variance in peer, over and above COA status. Still, approximately 80% of the variance in peer remains unexplained by the variables in our model.
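Plugging values into the fitted equation makes the estimates concrete (pred_peer is our own helper, just arithmetic on the estimates printed above):

```r
# Model-implied peer score from the multiple-regression estimates above
# (pred_peer is a hypothetical helper name; the coefficients are from the output).
pred_peer <- function(coa, gen, age) {
  -1.610 + 0.210 * coa - 0.048 * gen + 0.151 * age
}
pred_peer(coa = 1, gen = 0, age = 13)  # a 13-year-old girl with an alcoholic parent
```

For example, the model-implied score for a 13-year-old girl whose parent is alcoholic is -1.610 + .210 + .151(13) = .563.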
As a default, lavaan allows all exogenous variables to covary but it fixes the residual covariances among endogenous variables to zero. The residual terms of stress and emotion are freed to covary within the section of the code marked #covariances. To designate a covariance (or variance) we use the ~~ operator. Thus, stress ~~ emotion tells lavaan to include a residual covariance for stress and emotion.
The final three lines request the model-implied covariance matrix (fitted(fit)) and the raw and normalized residual matrices (resid(fit, type="raw") and resid(fit, type="normalized"), respectively). These residuals represent the differences between the observed covariance matrix and the matrix implied by the structure of the model.
pathmodel.1 <- '
#regressions
stress ~ coa + gen + age
emotion ~ coa + gen + age
negaff ~ stress + emotion
peer ~ negaff
#covariances
stress ~~ emotion
'
fit <- sem(pathmodel.1, data=afdp, meanstructure=TRUE)
summary(fit, standardized=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 40 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 18
Number of observations 316
Model Test User Model:
Test statistic 81.173
Degrees of freedom 8
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
stress ~
coa 0.451 0.072 6.223 0.000 0.451 0.331
gen -0.016 0.073 -0.215 0.830 -0.016 -0.011
age 0.002 0.025 0.078 0.938 0.002 0.004
emotion ~
coa 0.110 0.056 1.963 0.050 0.110 0.110
gen -0.048 0.056 -0.843 0.399 -0.048 -0.047
age -0.027 0.019 -1.374 0.170 -0.027 -0.077
negaff ~
stress 0.246 0.078 3.134 0.002 0.246 0.175
emotion 0.553 0.106 5.206 0.000 0.553 0.290
peer ~
negaff 0.176 0.030 5.892 0.000 0.176 0.315
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.stress ~~
.emotion 0.112 0.019 5.896 0.000 0.112 0.352
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.stress 0.687 0.333 2.066 0.039 0.687 1.010
.emotion 2.341 0.258 9.088 0.000 2.341 4.663
.negaff 1.527 0.207 7.373 0.000 1.527 1.594
.peer -0.118 0.091 -1.298 0.194 -0.118 -0.220
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.stress 0.412 0.033 12.570 0.000 0.412 0.890
.emotion 0.247 0.020 12.570 0.000 0.247 0.979
.negaff 0.778 0.062 12.570 0.000 0.778 0.848
.peer 0.260 0.021 12.570 0.000 0.260 0.901
R-Square:
Estimate
stress 0.110
emotion 0.021
negaff 0.152
peer 0.099
We report fully standardized effects (from the Std.all column generated by the first summary function call) except for coa, gen, and age, for which we report partially standardized effects (from the Std.nox column generated by the second summary function call). Recall that only the outcome variables are standardized when computing partially standardized effects, which makes them particularly useful for examining the effects of coded variables (e.g., coa and gen) or predictors with natural metrics (e.g., age).
These partially standardized parameter estimates represent the expected change in standard deviation units in y given a one unit increase in x. By comparison, the fully standardized estimates are computed by standardizing both the predictors and the outcome variables so that parameter estimates represent the expected change in standard deviation units in y given a one standard deviation increase in x. These are most useful for interpreting effects when neither predictors nor outcomes have natural scales and when gauging relative effect sizes across predictors on different scales. For covariance parameters, fully standardized estimates can be interpreted as correlations.
From the R-Square section of output, we can also see that only a modest amount of the total variance in any of these variables has been explained by the model.
resid(fit, type="normalized")
$type
[1] "normalized"
$cov
stress emotin negaff peer coa gen age
stress 0.000
emotion 0.000 0.000
negaff 0.000 0.000 0.000
peer 2.657 0.395 0.000 0.000
coa 0.000 0.000 -0.163 2.685 0.000
gen 0.000 0.000 -1.580 -1.548 0.000 0.000
age 0.000 0.000 3.229 8.015 0.000 0.000 0.000
$mean
stress emotion negaff peer coa gen age
0 0 0 0 0 0 0
The normalized residuals are rescaled to follow a standard normal distribution and values exceeding plus or minus two are often taken as potentially meaningful in magnitude. The normalized residuals for this model suggest that the hypothesized structure is doing a poor job in reproducing the observed covariances between several variables, most notably between age and peer, between age and negaff, and between stress and peer. We will explore methods for more formally assessing overall model fit in the next chapter.
We obtain measures of fit, appropriately enough, by including fit.measures=TRUE
summary(fit, fit.measures=TRUE)
lavaan 0.6-12 ended normally after 40 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 18
Number of observations 316
Model Test User Model:
Test statistic 81.173
Degrees of freedom 8
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 251.027
Degrees of freedom 18
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.686
Tucker-Lewis Index (TLI) 0.293
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -1158.938
Loglikelihood unrestricted model (H1) -1118.351
Akaike (AIC) 2353.875
Bayesian (BIC) 2421.479
Sample-size adjusted Bayesian (BIC) 2364.387
Root Mean Square Error of Approximation:
RMSEA 0.170
90 Percent confidence interval - lower 0.138
90 Percent confidence interval - upper 0.205
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.085
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
stress ~
coa 0.451 0.072 6.223 0.000
gen -0.016 0.073 -0.215 0.830
age 0.002 0.025 0.078 0.938
emotion ~
coa 0.110 0.056 1.963 0.050
gen -0.048 0.056 -0.843 0.399
age -0.027 0.019 -1.374 0.170
negaff ~
stress 0.246 0.078 3.134 0.002
emotion 0.553 0.106 5.206 0.000
peer ~
negaff 0.176 0.030 5.892 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
.stress ~~
.emotion 0.112 0.019 5.896 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.stress 0.687 0.333 2.066 0.039
.emotion 2.341 0.258 9.088 0.000
.negaff 1.527 0.207 7.373 0.000
.peer -0.118 0.091 -1.298 0.194
Variances:
Estimate Std.Err z-value P(>|z|)
.stress 0.412 0.033 12.570 0.000
.emotion 0.247 0.020 12.570 0.000
.negaff 0.778 0.062 12.570 0.000
.peer 0.260 0.021 12.570 0.000
We can request that lavaan suggest changes to our model based on the expected change in model chi-square if a fixed parameter were freed; these are the modification indices (MIs). We can obtain MIs with the modindices function. The sort. argument is set to TRUE so that lavaan will present the largest MIs first. Additionally, the minimum.value argument is set to suppress printing of any MIs less than 10.
modindices(fit, sort.=TRUE, minimum.value=10)
Here, lhs stands for "left-hand side of the operator" and rhs for "right-hand side of the operator." The operator, denoted op, for these parameters is ~, indicating these are regression slopes. mi denotes the expected improvement in the model chi-square test statistic if the modification is accepted. epc denotes the "expected parameter change" (EPC) and gives the expected parameter value if the modification is implemented (since the parameter is freed from zero). sepc.lv re-scales the EPC by standardizing any latent variables in the model, sepc.all re-scales the EPC by standardizing all variables (observed and latent), and sepc.nox re-scales the EPC by standardizing all variables except exogenous predictors. Here, sepc.lv is equivalent to the EPC because there are no latent variables in the model. It can be helpful to rely on standardized EPC values to get a sense of the relative magnitude of each potential modification.
Final Model
pathmodel.5 <-'
#regressions
stress ~ coa + gen + age
emotion ~ coa + gen + age
negaff ~ stress + emotion + age
peer ~ negaff + age + coa
#covariances
stress ~~ emotion
'
fit.5 <- sem(pathmodel.5, data=afdp, meanstructure=TRUE)
summary(fit.5, fit.measures=TRUE)
A mediator explains why or how one variable exerts an effect on another. The point estimate of an indirect effect is the product of the coefficients along the chain; its confidence interval can be calculated by the delta method or by bootstrapping.
Significant links in a mediational pathway are not sufficient to infer mediation. In order to formally test the full mediation effect, we need an inferential test of the entire specific indirect effect in question. Here, we want to know whether the specific mediational pathway of COA status on deviant peer affiliation via stressful events and negative affect is statistically significant, whether the specific mediational effect of coa on peer via emotionality and negative affect is significant, and whether the overall indirect effect of coa on peer is significant. Finally, we would like to have an estimate of the total effect of coa on peer considering both direct and indirect pathways.
We've added parameter labels for the paths involved in the direct and indirect effects of coa on peer. For instance, in the line stress ~ c_s*coa + gen + age, we have supplied the label c_s for the effect of coa on stress. Likewise, the other paths involved in computing the effects of interest have also been labeled. To help keep track of things, our labels consist of the first letter of the predictor, an underscore, and the first letter of the outcome, but any labels are fine. We can now refer to these labels when defining direct, indirect, and total effects.
Following the core model specification, we have new lines to define the direct, specific indirect, total indirect, and total effects. This is done using the "define" operator :=. For instance, ind_s := c_s*s_n*n_p requests that lavaan compute the quantity ind_s as the product of the three paths previously labeled c_s, s_n, and n_p.
effects.5 <-'
#regressions
stress ~ c_s*coa + gen + age
emotion ~ c_e*coa + gen + age
negaff ~ s_n*stress + e_n*emotion + age
peer ~ n_p*negaff + age + c_p*coa
#covariances
stress ~~ emotion
#direct effect
dir := c_p
#specific indirect effects
ind_s := c_s*s_n*n_p
ind_e := c_e*e_n*n_p
#total indirect effect
tot_ind := c_s*s_n*n_p + c_e*e_n*n_p
#total effect
tot := c_s*s_n*n_p + c_e*e_n*n_p + c_p
'
set.seed(62973)
fit.5b <- sem(effects.5, data=afdp, meanstructure=TRUE, se="bootstrap", bootstrap=1000)
Note that we have used the set.seed function to set the random number seed for selecting the bootstrapped samples so that we can reproduce exactly these same results each time we run the script.
Then, in the sem function, we specified se="bootstrap", bootstrap=1000 to request that lavaan compute bootstrapped standard errors with 1000 bootstrapped samples. We actually don't care much about the standard errors. What we really want are bootstrapped confidence intervals, which are generated by the parameterEstimates function with the boot.ci.type argument. The parameterEstimates function produces a simple list of the parameter estimates from the model. By including the boot.ci.type argument, we request that this output include 95% bootstrapped confidence intervals.
Note that there exist multiple methods for calculating bootstrapped CIs. The percentile method is probably the easiest to understand: the 1000 bootstrapped estimates are ordered by magnitude and the 25th estimate (2.5th percentile) and 975th estimate (97.5th percentile) are taken as the confidence limits, yielding a 95% confidence interval. This method, which is widely available in many software programs, is obtained via boot.ci.type = "perc". That is the method we used in the lecture notes. A slightly preferable approach, however, is to use bias-corrected bootstrapped confidence intervals, and these are readily available in lavaan. To obtain the bias-corrected confidence intervals, we simply use boot.ci.type = "bca.simple". The bias-corrected intervals are shown in the output below. In terms of null hypothesis testing, the same conclusions are reached with the bias-corrected CIs as with the percentile CIs presented in lecture, but there are some slight differences in the specific values of the confidence limits.
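To make the percentile method concrete, here is a hand-rolled version on simulated data (a base-R sketch with made-up data and our own function names, not the afdp model; lavaan automates all of this internally):

```r
# Hand-rolled percentile bootstrap CI for an indirect effect a*b
# (simulated single-mediator model; names x, m, y, ab are ours).
set.seed(62973)
n <- 1000
x <- rnorm(n)
m <- 0.4 * x + rnorm(n)        # mediator: a-path slope = 0.4
y <- 0.3 * m + rnorm(n)        # outcome:  b-path slope = 0.3
d <- data.frame(x, m, y)
ab <- function(dat) {          # indirect effect = a * b
  a <- coef(lm(m ~ x, data = dat))[["x"]]
  b <- coef(lm(y ~ m + x, data = dat))[["m"]]
  a * b
}
boot <- replicate(1000, ab(d[sample(n, replace = TRUE), ]))
quantile(boot, c(0.025, 0.975))  # percentile 95% CI, cf. boot.ci.type = "perc"
```

Because neither limit crosses zero here, the simulated indirect effect would be judged significant, which is exactly the logic applied to the lavaan output below.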
parameterEstimates(fit.5b, boot.ci.type = "perc")
parameterEstimates(fit.5b, boot.ci.type = "bca.simple")
The total effect of coa on peer is .209 and represents a combination of the direct effect (.185) and the total indirect effect (.024). The 95% CI ranges from .094 to .306; because this interval does not contain zero, the total effect is deemed significant.
Examining the mediational pathways, we see that the specific indirect effect coa→stress→negaff→peer, labeled ind_s to indicate it is the indirect effect through stress, is equal to .015 (95% CI=.005, .036) and is significant (does not include zero). The biological pathway from coa→emotion→negaff→peer, labeled ind_e to indicate it is the indirect effect through emotion, is equal to .009 (95% CI=0, .022) and thus does not reach statistical significance (because the lower CI is equal to zero).
Latent variables are also called: unmeasured variables, constructs, factors, true scores, unobserved variables, unobserved heterogeneity. Latent variables are used for: factors in factor analysis, latent variables in structural equation models, basis curves in latent curve analysis, latent classes (class/cluster/mixture models), latent traits, underlying propensity variables for censored/limited dependent variables, missing data, and random effects in multilevel or mixed models.
Goals of Factor Analysis: determine the underlying structure behind a set of observed measures.
Exploratory factor analysis is primarily data-driven: all observed variables are regressed on all factors, and all of the regression slopes (loadings) are freely estimated. Confirmatory factor analysis formally tests a theoretical factor structure: the number and nature of the latent variables, and the relationships between indicators and latent variables, are specified by theory.
Outcome variable is regressed on the predictors. As the target predicted outcome y depends on the predictor x, you can say that “is regressed on” means “is dependent on”. The word “regressed” is used instead of “dependent” because we want to emphasise that we are using a regression technique to represent this dependency between x and y.
\(\Theta\) is the residual covariance matrix.
Communality: the \(R^2\) for the regression of an indicator on the set of common factors. Items with high communalities are good items, as most of their variance reflects the influence of the common factors: \[ h^2 = 1 - \frac{\mathrm{VAR}(\varepsilon)}{\mathrm{VAR}(y)} \]
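In a fully standardized single-factor solution (factor variance fixed to 1, indicator variance 1), the communality reduces to the squared standardized loading. For example, using a loading of .772, the same value as the standardized visperc loading in the CFA output below (purely arithmetic):

```r
# Communality as squared standardized loading (single-factor, standardized case).
lambda <- 0.772   # a standardized loading, cf. visperc's Std.all in the CFA output
h2 <- lambda^2    # proportion of the indicator's variance due to the common factor
h2                # about 0.596, matching visperc's R-Square in the CFA output
```

This is why lavaan's R-Square values for indicators equal the squared Std.all loadings when each indicator loads on a single factor.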
Process of conducting a CFA
Setting the scale of latent variables
CFA example dataset description: the data for this demonstration were provided by Holzinger & Swineford in their 1939 monograph A Study in Factor Analysis: The Stability of a Bi-Factor Solution. The sample includes 301 7th- and 8th-grade students, between 11 and 16 years of age, drawn from two schools. The data are in the text file hs.dat and the accompanying R script file is ch04.R. The variables in the data set that we will use are:
visperc: visual perception test in which participants select the next image in a series
cubes: visual perception test in which participants must mentally rotate a cube
lozenges: visual perception test involving mental "flipping" of a parallelogram ("lozenge")
parcomp: paragraph comprehension test
sencomp: sentence completion task in which participants select the most appropriate word to put at the end of a sentence
wordmean: verbal ability test in which participants must select the word most similar in meaning to a word used in a sentence
addition: participants have 2 minutes to complete as many 2-number addition problems as they can
countdot: participants have 4 minutes to count the number of dots in each of a series of dot pictures
sccaps: participants have 3 minutes to indicate whether capital letters are composed entirely of straight lines or include curved lines
Other variables in the data not included in the models fit here are school, female, age, and month.
hs <- read.table("data/hs.dat", header=FALSE)
names(hs) <- c("school", "female", "age", "month", "visperc", "cubes", "lozenges", "parcomp", "sencomp", "wordmean", "addition", "countdot", "sccaps")
We rescaled the variables addition, countdot, and sccaps by dividing each by four. This was done to facilitate model estimation, as numerical instability can arise when variables are on widely differing scales; dividing by four brings their standard deviations closer to those of the other observed variables.
hs$addition <- hs$addition/4
hs$countdot <- hs$countdot/4
hs$sccaps <- hs$sccaps/4
Finally, we load the lavaan package that we will use to fit the CFA models:
The cfa function in lavaan imposes a number of defaults consistent with how CFA models are typically specified, enabling compact model syntax objects. The default scaling for the latent variables is a hybrid of the approaches described in lecture: the latent variables have means fixed to zero but freely estimated variances, and the factor loading of the first indicator of each latent variable is fixed to one while the intercept for that indicator is freely estimated. We will override these defaults to implement the scaling options described in lecture. In defining the model syntax object, we use the =~ operator, which defines a latent variable (left-hand side) and specifies the variables that load on it (right-hand side). Thus, visual =~ visperc + cubes + lozenges defines a new latent variable, visual, and indicates that visperc, cubes, and lozenges are its indicators.
cfa.1a <-'#factor loadings
visual =~ visperc + cubes + lozenges
verbal =~ parcomp + sencomp + wordmean
speed =~ addition + countdot + sccaps
'
fit.1a <- cfa(cfa.1a, data=hs, meanstructure=TRUE, std.lv=TRUE)
summary(fit.1a, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 28 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 30
Number of observations 301
Model Test User Model:
Test statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 918.852
Degrees of freedom 36
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.931
Tucker-Lewis Index (TLI) 0.896
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -8326.241
Loglikelihood unrestricted model (H1) -8283.589
Akaike (AIC) 16712.483
Bayesian (BIC) 16823.696
Sample-size adjusted Bayesian (BIC) 16728.553
Root Mean Square Error of Approximation:
RMSEA 0.092
90 Percent confidence interval - lower 0.071
90 Percent confidence interval - upper 0.114
P-value RMSEA <= 0.05 0.001
Standardized Root Mean Square Residual:
SRMR 0.060
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual =~
visperc 5.398 0.485 11.128 0.000 5.398 0.772
cubes 1.992 0.310 6.429 0.000 1.992 0.424
lozenges 5.249 0.595 8.817 0.000 5.249 0.581
verbal =~
parcomp 2.969 0.170 17.474 0.000 2.969 0.852
sencomp 4.406 0.251 17.576 0.000 4.406 0.855
wordmean 6.416 0.376 17.082 0.000 6.416 0.838
speed =~
addition 3.562 0.400 8.903 0.000 3.562 0.570
countdot 3.655 0.330 11.090 0.000 3.655 0.723
sccaps 6.030 0.585 10.305 0.000 6.030 0.665
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual ~~
verbal 0.459 0.064 7.189 0.000 0.459 0.459
speed 0.471 0.073 6.461 0.000 0.471 0.471
verbal ~~
speed 0.283 0.069 4.117 0.000 0.283 0.283
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.visperc 29.615 0.403 73.473 0.000 29.615 4.235
.cubes 24.352 0.271 89.855 0.000 24.352 5.179
.lozenges 18.003 0.521 34.579 0.000 18.003 1.993
.parcomp 9.183 0.201 45.694 0.000 9.183 2.634
.sencomp 17.362 0.297 58.452 0.000 17.362 3.369
.wordmean 15.299 0.441 34.667 0.000 15.299 1.998
.addition 24.069 0.360 66.766 0.000 24.069 3.848
.countdot 27.635 0.291 94.854 0.000 27.635 5.467
.sccaps 48.367 0.523 92.546 0.000 48.367 5.334
visual 0.000 0.000 0.000
verbal 0.000 0.000 0.000
speed 0.000 0.000 0.000
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.visperc 19.766 4.090 4.833 0.000 19.766 0.404
.cubes 18.141 1.628 11.146 0.000 18.141 0.821
.lozenges 54.037 5.800 9.317 0.000 54.037 0.662
.parcomp 3.341 0.429 7.779 0.000 3.341 0.275
.sencomp 7.140 0.934 7.642 0.000 7.140 0.269
.wordmean 17.454 2.109 8.277 0.000 17.454 0.298
.addition 26.430 2.691 9.823 0.000 26.430 0.676
.countdot 12.192 1.855 6.573 0.000 12.192 0.477
.sccaps 45.857 5.730 8.003 0.000 45.857 0.558
visual 1.000 1.000 1.000
verbal 1.000 1.000 1.000
speed 1.000 1.000 1.000
R-Square:
Estimate
visperc 0.596
cubes 0.179
lozenges 0.338
parcomp 0.725
sencomp 0.731
wordmean 0.702
addition 0.324
countdot 0.523
sccaps 0.442
The modification indices for the model are obtained from the modindices function as follows:
modindices(fit.1a, sort.=TRUE, minimum.value=10)
Here we see the largest modification indices are associated with a cross-loading for sccaps on visual, and a correlated uniqueness for countdot with addition. It is important to keep in mind that both modification indices may be related to the same misspecification.
Initial Model with Scaling Items: model fit is the same
We choose the first indicator for each factor to be the scaling item. The intercept and factor loading for each scaling item are set to zero and one, respectively. This scaling option permits the means and variances of the latent factors to be estimated.
cfa.1b <-'#factor loadings
visual =~ visperc + cubes + lozenges
verbal =~ parcomp + sencomp + wordmean
speed =~ addition + countdot + sccaps
#means/intercepts
visual ~ 1
verbal ~ 1
speed ~ 1
visperc ~ 0*1
parcomp ~ 0*1
addition ~ 0*1
'
fit.1b <- cfa(cfa.1b, data=hs, meanstructure=TRUE)
#summary(fit.1b, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
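Because the two scaling choices impose the same set of restrictions, the global fit statistics are identical; only the parameterization differs. A quick hedged check (assuming fit.1a and fit.1b from above are in the workspace) is to line up a few fit measures side by side:

```r
# Compare global fit under standardized-factor (fit.1a) vs.
# marker-variable (fit.1b) scaling; the columns should match.
round(cbind(std.lv = fitMeasures(fit.1a, c("chisq", "df", "cfi", "rmsea")),
            marker = fitMeasures(fit.1b, c("chisq", "df", "cfi", "rmsea"))), 3)
```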
cfa.2 <- '#factor loadings
visual =~ visperc + cubes + lozenges + sccaps
verbal =~ parcomp + sencomp + wordmean
speed =~ addition + countdot + sccaps
'
fit.2 <- cfa(cfa.2, data=hs, meanstructure=TRUE, std.lv=TRUE)
#summary(fit.2, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
lavTestLRT(fit.2, fit.1a)
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.2 23 16682 16796 52.382
fit.1a 24 16712 16824 85.305 32.923 1 9.586e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
modindices(fit.2, sort.=TRUE, minimum.value=10)
Note that no further changes are suggested for the model.
cfa.3 <- '#factor loadings
visual =~ visperc + cubes + lozenges
verbal =~ parcomp + sencomp + wordmean
speed =~ addition + countdot + sccaps
#covariances
addition ~~ countdot
'
fit.3 <- cfa(cfa.3, data=hs, meanstructure=TRUE, std.lv=TRUE)
#summary(fit.3, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
lavTestLRT(fit.3, fit.1a)
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
fit.3 23 16682 16797 53.272
fit.1a 24 16712 16824 85.305 32.033 1 1.516e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
modindices(fit.3, sort.=TRUE, minimum.value=10)
This model is not nested with the alternative modified model; however, the two models provide nearly equivalent fit (RMSEA=.065 versus .066, TLI=.948 versus .946, etc.). Thus, model selection should be based on which modification is most plausible, and on the interpretability of the parameter estimates.
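Since the chi-square difference test does not apply to non-nested models, information criteria offer a rough guide (lower is better). A minimal sketch, assuming fit.2 and fit.3 from above:

```r
# AIC/BIC can be compared across non-nested models fit to the same data.
AIC(fit.2, fit.3)
BIC(fit.2, fit.3)
```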
Measurement invariance (MI) refers to the extent to which the same measurements conducted under different conditions yield the same measures of the attributes under study. Longitudinal invariance is said to exist if the distribution of the observed variables depends only on the level of the latent variable and not also on the time point at which it was measured: \(f(y \mid \eta, t) = f(y \mid \eta)\)
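Invariance is typically tested by fitting a sequence of increasingly constrained models and comparing them with likelihood ratio tests. The sketch below illustrates the logic in the multi-group case using lavaan's built-in HolzingerSwineford1939 data (indicators x1-x9, grouping variable "school"); the same stepwise strategy applies to longitudinal invariance, where equality constraints are placed across time points instead of groups.

```r
library(lavaan)

hs.mod <- 'visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9'

# Configural: same pattern, parameters free across groups
fit.config <- cfa(hs.mod, data = HolzingerSwineford1939, group = "school")
# Metric: equal factor loadings
fit.metric <- cfa(hs.mod, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")
# Scalar: equal loadings and intercepts
fit.scalar <- cfa(hs.mod, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

lavTestLRT(fit.config, fit.metric, fit.scalar)
```

A non-significant chi-square difference at each step supports the corresponding level of invariance.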
The latent growth curve model captures growth as a latent factor. The latent curve model (LCM) is fundamentally a highly restricted CFA model. The reduced-form equation is \(y_i = \nu + \Lambda(\alpha + \zeta_i) + \varepsilon_i\)
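The "highly restricted CFA" character is visible in the fixed loading pattern: all loadings on the intercept factor are 1, and the slope loadings encode time. A minimal linear sketch using lavaan's built-in Demo.growth data (repeated measures t1-t4); growth() additionally fixes the observed-variable intercepts to zero and frees the factor means:

```r
library(lavaan)

# Linear latent curve model: intercept factor i, slope factor s
lcm <- 'i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
        s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4'
fit.lcm <- growth(lcm, data = Demo.growth)
summary(fit.lcm, fit.measures = TRUE)
```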
A time-invariant predictor predicts both the intercept and the slope of change through the structural model: add \(\Gamma x_i\), giving \(y_i = \nu + \Lambda(\alpha + \Gamma x_i + \zeta_i) + \varepsilon_i\). While time-invariant covariates add predictors to the structural model, time-varying covariates add predictors to the measurement model: add \(K z_i\), giving \(y_i = \nu + \Lambda(\alpha + \zeta_i) + K z_i + \varepsilon_i\).
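Both kinds of covariates can be sketched in one conditional growth model using lavaan's Demo.growth data, where x1 and x2 are time-invariant covariates (the \(\Gamma x_i\) term, predicting the growth factors) and c1-c4 are time-varying covariates (the \(K z_i\) term, predicting the repeated measures directly):

```r
library(lavaan)

lcm.cond <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  # structural part: time-invariant covariates predict the growth factors
  i ~ x1 + x2
  s ~ x1 + x2
  # measurement part: time-varying covariates predict each occasion
  t1 ~ c1
  t2 ~ c2
  t3 ~ c3
  t4 ~ c4
'
fit.cond <- growth(lcm.cond, data = Demo.growth)
summary(fit.cond)
```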
The lme function in the nlme package is flexible in specifying the autocorrelation structure:
summary(lme(AIQ ~ 1 + SZ + T1 + T2 + T3 + SZ:T1 + SZ:T2 + SZ:T3,
            random = ~ T1 + T2 | ID, data = cog_pre_long, na.action = na.omit,
            correlation = corCompSymm(form = ~ 1 | ID), method = "ML",
            control = list(msMaxIter = 1000, msMaxEval = 1000)))
The equivalent model with the lmer function in the lme4 package (which does not support residual correlation structures):
summary(lmer(AIQ ~ 1 + SZ + T1 + T2 + T3 + SZ:T1 + SZ:T2 + SZ:T3 + (1 + T1 + T2 | ID), data = cog_dat_long))